Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataflow: Simplify references to access paths from prior stage. #18258

Merged
merged 14 commits into from
Dec 11, 2024

Conversation

aschackmull
Copy link
Contributor

We carry ApApprox columns through a lot of predicates for a number of joins, which then might filter a little bit more, but ultimately most of this filtering will be achieved by the main filtering in fwdFlow1 where we join with PrevStage::revFlow.

MRVA testing shows that the number of additional tuples calculated in fwdFlow+revFlow is completely negligible by removing these early joins. The worst case I found yielded an additional 0.1% tuples, but all other cases had much smaller increases. So it's quite plausible that the additional data in all of these columns cost more performance than they provide. And if so, this will be a nice simplification.

Commit-by-commit review is recommended.

@aschackmull aschackmull requested a review from a team as a code owner December 10, 2024 13:52
@github-actions github-actions bot added the Java label Dec 10, 2024
@aschackmull
Copy link
Contributor Author

Dca is looking good - fairly uneventful. There doesn't appear to be any regressions, and there's possibly a few small performance improvements.

@hvitved hvitved added the no-change-note-required This PR does not need a change note label Dec 11, 2024
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification 🎉 . Two questions.

exists(boolean emptyAp |
flowIntoCallApaTaken(call, c, arg, p, emptyAp) and
fwdFlow(arg, _, _, _, _, ap, _, _) and
if ap instanceof ApNil then emptyAp = true else emptyAp = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is repeated in many places, so perhaps move out into a separate predicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also took the opportunity to rename a couple of predicates that were now somewhat misleading as their name included reference to the removed ApApprox column.

@@ -171,10 +171,12 @@ edges
| B.java:154:17:154:28 | source(...) : String | B.java:175:5:175:6 | String s2 : String | provenance | |
| B.java:158:7:158:13 | parameter this : MyLocal [String s1] : String | B.java:159:18:159:19 | this : MyLocal [String s1] : String | provenance | |
| B.java:158:7:158:13 | parameter this : MyLocal [String s2] : String | B.java:160:14:160:15 | this : MyLocal [String s2] : String | provenance | |
| B.java:158:7:158:13 | parameter this : MyLocal [String s2] : String | B.java:160:14:160:15 | this : MyLocal [String s2] : String | provenance | |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these test changes expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. When a callable has both flow-through and flow to a sink without flow-through and the difference is dependent on the access path (and not just if it is empty or not), then we generate slightly more nodes, since we can no longer determine with the same level of precision whether we need to remember a summary context or not. To pinpoint it in the code, this is due to the change to PrevStage::parameterMayFlowThrough using an emptyAp boolean instead of ApApprox in its use in callEdgeArgParamRestricted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know that this is expected.

@aschackmull aschackmull merged commit 066cfa3 into github:main Dec 11, 2024
35 checks passed
@aschackmull aschackmull deleted the dataflow/simplify-apapprox3 branch December 11, 2024 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DataFlow Library Java no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants